Empirical link between hypothesis diversity and fusion performance in an ensemble of automatic speech recognition systems
نویسندگان
چکیده
Diversity is crucial to reducing the word error rate (WER) when fusing multiple automatic speech recognition (ASR) systems. We present an empirical analysis linking diversity and fusion performance. We transcribed speech from the first 2012 US Presidential debate using multiple ASR systems trained with the Kaldi toolkit. We used the N-best ROVER algorithm to perform hypothesis fusion and measured N-best diversity by the average pairwise WER. We make three key observations. We first note that the WER of the fused hypothesis decreases significantly with increasing diversity of the N-best list. This decrease is greater than the decrease in WER of the oracle hypothesis in the list. N-best lists from systems trained on different data sets are the most diverse and give the lowest WER upon fusion. We then observe that the benefit of diversity depends on the choice of the fusion scheme. We show that confidenceweighted ROVER is able to better exploit diversity than unweighted ROVER and gives lower WERs. We finally explain the above observations by a simple linear relation linking diversity to the ROVER WER. This relation depends on the fusion scheme and also reveals the tradeoff between diversity and average WER of hypotheses in the N-best list.
منابع مشابه
Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition
In this paper, utilization of clustering algorithms for data fusion in decision level is proposed. The results of automatic isolated word recognition, which are derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined with each other by using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...
متن کاملClassifier Ensemble Framework: a Diversity Based Approach
Pattern recognition systems are widely used in a host of different fields. Due to some reasons such as lack of knowledge about a method based on which the best classifier is detected for any arbitrary problem, and thanks to significant improvement in accuracy, researchers turn to ensemble methods in almost every task of pattern recognition. Classification as a major task in pattern recognition,...
متن کاملGeneralized Ambiguity Decomposition for Understanding Ensemble Diversity
Diversity or complementarity of experts in ensemble pattern recognition and information processing systems is widely-observed by researchers to be crucial for achieving performance improvement upon fusion. Understanding this link between ensemble diversity and fusion performance is thus an important research question. However, prior works have theoretically characterized ensemble diversity and ...
متن کاملA Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
متن کاملبهبود عملکرد سیستم بازشناسی گفتار پیوسته بوسیله ویژگیهای استخراج شده از مانیفولدهای گفتاری در فضای بازسازی شده فاز
The design for new feature extraction methods out of the speech signal and combination of their obtained information is one of the most effective approaches to improve the performance of automatic speech recognition (ASR) system. Recent researches have been shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...
متن کامل